{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 字面量\n",
    "\n",
    "python 中单引号和双引号是一样的"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('span\"\"m', \"It's\")"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'span\"\"m', \"It's\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**可以使用 `'''` 或 `\"\"\"` 创建字符串**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('aaaa', 'bbbb')"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'''aaaa''', \"\"\"bbbb\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**带有转义字符的字符串，会将 `\\` 看做转义标识**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a\\tb\\nc\\x00m'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"a\\tb\\nc\\0m\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**原始字符串，以 `r` 开头并不会将 `\\` 看做转义标识**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a\\\\b\\\\c'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "r\"a\\b\\c\"  #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**bytes 字符串，，以 `b` 开头**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b'sp\\x01am'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b'sp\\x01am'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Unicode 字符串，以 `u` 或 `U` 开头**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'eggs spam'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "u'eggs\\u0020spam'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**多行字符串**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'今天天气正好，\\n        我们去打球吧。\\n这是个好主意'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = \"\"\"今天天气正好，\n",
    "        我们去打球吧。\n",
    "这是个好主意\"\"\"\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "今天天气正好，\n",
      "        我们去打球吧。\n",
      "这是个好主意\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "aaaa\n",
      "bbbb\n",
      "     ddddd\n",
      "    \n",
      "ccccc\n"
     ]
    }
   ],
   "source": [
    "print('''aaaa\n",
    "bbbb\n",
    "     ddddd\n",
    "    \n",
    "ccccc''')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 转义特殊字符串\n",
    "\n",
    "在需要在字符中使用特殊字符时，python用反斜杠(\\)转义字符。如下表：\n",
    "\n",
    "转义字符 | 描述  | 示例\n",
    "---|---|---\n",
    "\\(在行尾时) | 续行符 | \n",
    "\\n    | 换行      |\n",
    "\\\\    | 反斜杠符号  |\n",
    "\\'    |   单引号    |\n",
    "\\\"    |   双引号   |\n",
    "\\a    | 响铃      |\n",
    "\\b    | 退格(Backspace) |\n",
    "\\f    |  换页  |\n",
    "\\r    |  回车  |\n",
    "\\t    |   横向制表符 |\n",
    "\\v    |  纵向制表符   |\n",
    "\\x    | 十六进制数，yy代表的字符，例如：\\x0a代表换行  |\n",
    "\\o    | 八进制数，yy代表的字符，例如：\\o12代表换行   |\n",
    "\\0    | 空       |\n",
    "\\u    |   16位 Unicode 字符   |\n",
    "\\U    |   32位 Unicode 字符         |\n",
    "\n",
    "> 原始字符串抑制转义"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a\\nb\\tc'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = 'a\\nb\\tc'\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "b\tc\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\x00'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'\\0'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7, 5)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(r'a\\nb\\tc'), len('a\\nb\\tc')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 字符串运算符\n",
    "\n",
    "下表实例变量a值为字符串 \"Hello\"，b变量值为 \"Python\"：\n",
    "\n",
    "| 操作符 | 描述                                                         | 实例                            |\n",
    "| ------ | ------------------------------------------------------------ | ------------------------------- |\n",
    "| +      | 字符串连接                                                   | a + b 输出结果： HelloPython    |\n",
    "| *      | 重复输出字符串                                               | a*2 输出结果：HelloHello        |\n",
    "| []     | 通过索引获取字符串中字符                                     | a[1] 输出结果 **e**             |\n",
    "| [ : ]  | 截取字符串中的一部分                                         | a[1:4] 输出结果 **ell**         |\n",
    "| in     | 成员运算符 - 如果字符串中包含给定的字符返回 True             | **'H' in a** 输出结果 1         |\n",
    "| not in | 成员运算符 - 如果字符串中不包含给定的字符返回 True           | **'M' not in a** 输出结果 1     |\n",
    "| %      | 格式字符串                                                   | 请看下一节内容。                |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'abcdef'"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'abc' + 'def'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'NiNiNiNi'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'Ni' * 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "print('-' * 80)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "h a c k e r "
     ]
    }
   ],
   "source": [
    "myjob = \"hacker\"\n",
    "for c in myjob:\n",
    "    print(c, end=' ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**判断是否子串**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'k' in myjob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'z' in myjob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'spam' in 'abcspamdef'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 索引和切割\n",
    "\n",
    "S[i]: 抽取第 i 个字符\n",
    "\n",
    "S[i:j]: 抽取索引为 i 到 j 的子串（包含 i 不包含 j）\n",
    "\n",
    "S[i:]: 抽取索引为 i 之后的子串（包含 i）\n",
    "\n",
    "S[:j]:  抽取索引为 j 之前的子串（不包含 j）\n",
    "\n",
    "S[:]: 抽取整个字符串\n",
    "\n",
    "S[i:j:k]: 抽取索引为 i 到 j 的子串（包含 i 不包含 j），步长为 k\n",
    "\n",
    "S[::k]: 抽取整个字符串，步长为 k\n",
    "\n",
    "S[::-1]: 反向抽取整个字符串"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('s', 'a')"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S = 'spam'\n",
    "S[0], S[-2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('pa', 'pam', 'spa')"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S[1:3], S[1:], S[:-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('sa', 'sa', 'maps')"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S[0:4:2], S[::2], S[::-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 字符串不可改变性\n",
    "\n",
    "任何操作都不能改变原始字符串对象本身，都是产生新的字符串对象。当我们用变量时，会有一种改变字符串对象的错觉，这时其实是将变量指向了一个新的字符串对象。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'str' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-25-7ffb2cc33b47>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0mS\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'spam'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mS\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'x'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "S = 'spam'\n",
    "S[0] = 'x'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'spamSPAM!'"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S = S + 'SPAM!'  # 并没有改变原来的 'spam' 字符串对象，而是产生了一个新的字符串对象，并将变量 S 指向新对象\n",
    "S"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('spxxxxSPAM!', 'spamSPAM!')"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.replace('am', 'xxxx'), S  # 字符串的任何操作都不能改变源字符串对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "更多例子。注意这里的 `spam` 和 `eggs` 在内存中存储是不一样的，感兴趣的可以 google 一下。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(bytes, str)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B = b'spam'\n",
    "S = 'eggs'\n",
    "type(B), type(S)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(b'spam', 'eggs')"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B, S"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(115, 'e')"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B[0], S[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(b'pam', 'ggs')"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B[1:], S[1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "([115, 112, 97, 109], ['e', 'g', 'g', 's'])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(B), list(S)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 字符串转换\n",
    "\n",
    "## 数字转换\n",
    "\n",
    "和其他动态类型语言不同，python 中字符串对象和数值对象是不能直接做运算的，必须进行显示转换。这是 python 和其他动态类型语言不同的地方，它的类型动态性只是表现在变量上面，对于对象本身必须是特定类型，并且是强类型的。对于变量和对象的关系请看前面的文章。\n",
    "\n",
    "以下是一些实例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "must be str, not int",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-33-5d43c357ae74>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m'42'\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: must be str, not int"
     ]
    }
   ],
   "source": [
    "'42' + 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(42, '42')"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "int('42'), str(42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'42'"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "repr(42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "spam 'spam'\n"
     ]
    }
   ],
   "source": [
    "print(str('spam'), repr('spam'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里我们使用到了 `str` 和 `repr` 两个字符串函数，发现他们返回的东西几乎一样，其实他们是有差别的:\n",
    "\n",
    "* `str()`  的输出追求可读性，输出格式要便于理解，适合用于输出内容到用户终端。\n",
    "* `repr()` 的输出追求明确性，除了对象内容，还需要展示出对象的数据类型信息，适合开发和调试阶段使用。\n",
    "\n",
    "有兴趣的可以使用 `help` 方法查看两者的区别。\n",
    "\n",
    "以下是更多实例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "43"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "int(\"42\") + 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'421'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"42\" + str(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('1.234E-10', 1.234e-10)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"1.234E-10\", float(\"1.234E-10\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "B = '1101'\n",
    "I = 0\n",
    "while B!='':\n",
    "    I = I*2 + (ord(B[0]) - ord('0'))\n",
    "    B = B[1:]\n",
    "I"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(13, '0b1101')"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "int('1101', 2), bin(13)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 字符编码转换\n",
    "\n",
    "对于单个字符可以使用 `ord()` 函数获取他的 ASCII 码值，一个 ASCII 码值使用 `chr()` 得到对于的字符。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(115, 's')"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ord('s'), chr(115)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将字符转换成 ASCII 码过后，可以进行算数运算。但是直接使用字符做算数运算会报错，这里和 C 语言是不一样的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'6'"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chr(ord('5')+1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ord('5') - ord('0')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unsupported operand type(s) for -: 'str' and 'str'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-45-f436e1546941>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m'5'\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0;34m'0'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for -: 'str' and 'str'"
     ]
    }
   ],
   "source": [
    "'5' - '0'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 字符串方法调用\n",
    "\n",
    "使用 `dir(\"\")` 查看字符串类型有哪些支持的内置方法。每个方法的具体使用方式再使用 `help()` 查看。这里只列举部分方法使用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 字符串长度\n",
    "len('abc')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['a', 'b', 'c']"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 分隔字符串\n",
    "'a,b,c'.split(',')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a>b>c'"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 字符串拼接\n",
    "'>'.join(['a','b','c'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 字符串格式化\n",
    "\n",
    "1. 格式化表达式：'...%s...' %(values)\n",
    "2. 格式化方法调用：'...{}...'.format(values)\n",
    "\n",
    "格式化符号\n",
    "\n",
    "| 符   号 | 描述                                 |\n",
    "| ------- | ------------------------------------ |\n",
    "| %c      | 格式化字符及其ASCII码                |\n",
    "| %s      | 格式化字符串                         |\n",
    "| %r      | 和 s 类似，但是使用 repr 输出 |\n",
    "| %d      | 格式化整数                           |\n",
    "| %i      | 格式化整数                           |\n",
    "| %u      | 格式化无符号整型                     |\n",
    "| %o      | 格式化无符号八进制数                 |\n",
    "| %x      | 格式化无符号十六进制数               |\n",
    "| %X      | 格式化无符号十六进制数（大写）       |\n",
    "| %f      | 格式化浮点数字，可指定小数点后的精度 |\n",
    "| %F      | 和 f 类似，但是字母大写 |\n",
    "| %e      | 用科学计数法格式化浮点数             |\n",
    "| %E      | 作用同%e，用科学计数法格式化浮点数   |\n",
    "| %g      | %f和%e的简写                         |\n",
    "| %G      | %f 和 %E 的简写                      |\n",
    "| %p      | 用十六进制数格式化变量的地址         |\n",
    "| %%      | 输出 %         |\n",
    "\n",
    "\n",
    "\n",
    "> 基础格式化语法\n",
    ">\n",
    "> `%[(keyname)][flags][width][.precision]typecode`\n",
    ">\n",
    "> 高级格式化语法\n",
    ">\n",
    "> `{fieldname component !conversionflag :formatspec}`\n",
    ">\n",
    "> : formatspec: `[[fill]align][sign][#][0][width][,][.precision][typecode]`\n",
    "\n",
    "## 格式化表达式\n",
    "\n",
    "以下是一些实例"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'integers: ...1234...1234  ...001234'"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = 1234\n",
    "'integers: ...%d...%-6d...%06d' % (x,x,x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.234568e+00 | 1.234568 |1.23457'"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = 1.23456789\n",
    "'%e | %f |%g' %(x,x,x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.234568E+00'"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'%E' % x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "更多格式化类型使用，请自行实践"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('1.23456789', '1.23456789')"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'%s' % x, str(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'a=1; b=bbbb'"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 基于字典的格式化\n",
    "\n",
    "'a=%(a)d; b=%(bb)s' % {'a': 1, 'bb':'bbbb'}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`vars()` 函数返回所有的变量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.23456789 or eggs'"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'%(x)s or %(S)s' % vars()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 格式化方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'spam, orange, apple'"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "template = '{0}, {1}, {2}'\n",
    "template.format('spam', 'orange', 'apple')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'spam, orange, apple'"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "template = '{}, {}, {}'\n",
    "template.format('spam', 'orange', 'apple')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'spam, apple, orange'"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "template = '{a}, {b}, {c}'\n",
    "template.format(a='spam', c='orange', b='apple')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'orange, apple, spam'"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "template = '{a}, {b}, {0}'\n",
    "template.format('spam', a='orange', b='apple')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**添加关键字、属性和偏移量**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'My laptop runs linux'"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import sys\n",
    "'My {1[kind]} runs {0.platform}'.format(sys, {'kind': 'laptop'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'My laptop runs linux'"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'My {map[kind]} runs {sys.platform}'.format(sys=sys, map={'kind': 'laptop'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**高级格式化**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'9,999,999,999'"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'{:,d}'.format(9999999999)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# bytearray 操作\n",
    "\n",
    "对于 bytearray 这里列举一些使用的实例，不做深入介绍。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bytearray(b'spam')"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S = 'spam'\n",
    "# C = bytearray(S)  # 此语句在2.6+正常，但是在 3.X 中会语法错误\n",
    "C = bytearray(S, 'latin1') \n",
    "C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bytearray(b'spam')"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "C = bytearray(b'spam')\n",
    "C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "# bytearray 可以修改\n",
    "C[0] = 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "an integer is required",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-65-7c46a9aa2011>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# 但是修改必须是整数，不能是字符\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mC\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'1'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: an integer is required"
     ]
    }
   ],
   "source": [
    "# 但是修改必须是整数，不能是字符\n",
    "C[0] = '1'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bytearray(b'1Yam')"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "C[0] = ord('1')\n",
    "C[1] = b'Y'[0]\n",
    "C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bytearray(b'1Yam')"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "C"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 模式匹配\n",
    "\n",
    "模式匹配也叫正则表达式，也是字符串处理的一种方式，由于其强大功能，在实际项目中被普遍使用。现在不论是什么语言都有提供模式匹配的功能，并且在正则表达式的功能和语法上都打通大同小异。由于其语言的独立性和重要性后面会单独讲解，这里为了字符串操作的完整性做简单的实例展示，不解释。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**查找匹配**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['abc', 'abc']\n",
      "['I have a dog', 'I have a cat']\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "s='123abc456eabc789'\n",
    "print(re.findall(r'abc',s))\n",
    "\n",
    "s = 'I have a dog , I have a cat'\n",
    "print(re.findall( r'I have a (?:dog|cat)' , s ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**使用compile加速**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['111', '222', '555']"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s='111,222,aaa,bbb,555,ccc333,444ddd'\n",
    "rule=r'\\b\\d+\\b'\n",
    "compiled_rule=re.compile(rule)\n",
    "compiled_rule.findall(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**匹配数字开头**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['123']"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s='123 456/n789 012/n345 678'\n",
    "rc=re.compile(r'^\\d+') \n",
    "rc.findall(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 小结\n",
    "\n",
    "一篇文章不可能把字符串的方方面面都包括进来，比如字符串在 python 的存储、不同字符集的转行等。同时刚开始我们也不能用到所有的这些知识内容，所以在刚开始重点应该放在实际编码操作和整体知识体系的建立上，等熟练了或后面遇到相关问题了，在深入研究某块内容。希望对大家能有所帮助。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
